OCR-Free Document Understanding Transformer

Abstract

Understanding document images (e.g., invoices) is a core but challenging task since it requires complex functions such as reading text and a holistic understanding of the document. Current Visual Document Understanding (VDU) methods outsource the task of reading text to off-the-shelf Optical Character Recognition (OCR) engines and focus on the understanding task with the OCR outputs. Although such OCR-based approaches have shown promising performance, they suffer from 1) high computational costs for using OCR; 2) inflexibility of OCR models on languages or types of documents; 3) OCR error propagation to the subsequent process. To address these issues, in this paper, we introduce a novel OCR-free VDU model named Donut, which stands for Document understanding transformer. As the first step in OCR-free VDU research, we propose a simple architecture (i.e., Transformer) with a pre-training objective (i.e., cross-entropy loss). Donut is conceptually simple yet effective. Through extensive experiments and analyses, we show that a simple OCR-free VDU model, Donut, achieves state-of-the-art performance on various VDU tasks in terms of both speed and accuracy. In addition, we offer a synthetic data generator that helps the model pre-training to be flexible in various languages and domains. The code, trained model, and synthetic data are available at https://github.com/clovaai/donut.

Similar resources

Beyond OCR: Multi-faceted understanding of handwritten document characteristics

In the previous chapters, we proposed several features for writer identification, historical manuscript dating and localization separately. In this chapter, we present a summarization of the proposed features for different applications by proposing a joint feature distribution (JFD) principle to design novel discriminative features which could be the joint distribution of features on adjacent p...

Journal

Journal title: Lecture Notes in Computer Science

Year: 2022

ISSN: 1611-3349, 0302-9743

DOI: https://doi.org/10.1007/978-3-031-19815-1_29